Speeding up XML querying: satisfiability test & containment test of XPath queries in the presence of XML schema definitions
Author
Abstract
This dissertation develops approaches to testing the satisfiability and the containment of XPath queries in the presence of XML Schema definitions in order to speed up XML querying. XML provides a simple yet powerful mechanism for information storage, processing and delivery, and is a widely used standard data format. XPath is a basic language for querying XML data, and is embedded into many W3C standards, e.g. XQuery, XLink, XML Schema, XForms and Schematron, for addressing XML data. Therefore, XPath optimization plays a key role in speeding up XML query processing. The satisfiability test and the containment test of XPath are two important issues in XPath optimization. An unsatisfiable XPath query always returns an empty result. Therefore, applying the satisfiability test avoids the unnecessary submission and evaluation of unsatisfiable queries, and thus saves querying costs. In programming languages that embed XPath, such as XOBE [Kempa and Linnemann 2003a], the satisfiability test enables the efficient development of more robust applications by avoiding extensive tests and runtime failures caused by unsatisfiable queries. The satisfiability test can also speed up the execution of code by pre-computing an empty result at compile time. Furthermore, the XPath satisfiability test plays an important role in other applications, e.g. XML access control [Fan et al. 2004], type-checking of transformations [Martens and Neven 2004] and XPath-based index update [Hammerschmidt et al. 2005]. The containment of XPath is another key factor for XPath evaluation. XPath containment can be used to minimize XPath expressions to speed up query evaluation. When using views to answer queries, the containment test is the underlying technique for deciding whether a new query can be answered using the results of previous queries.
Using views to answer queries can significantly improve the performance of XPath processing and reduce communication and query costs by significantly decreasing the amount of shipped data, since part of the query evaluation has already been done when computing the cache, and since part or even all of the answer to the new query is already available on the client side. XPath containment also has applications in inferring the keys of XML Schema and in testing the satisfiability of XPath queries. Because of the high complexity of XPath queries, it is not trivial to develop efficient approaches to checking XPath satisfiability and XPath containment in the presence of schemas, especially recursive schemas. [Choi 2002] shows that recursive schemas are often used in the real world. The existing solutions to XPath satisfiability consider only subsets of the XPath axes and non-recursive schemas. In this thesis, we propose an approach to XPath satisfiability in the presence of XML Schema definitions that supports all XPath axes and both recursive and non-recursive schemas. Since XPath containment has a high complexity under constraints, there is a lack of work on practical solutions to this issue. In this work, we develop an approach to checking XPath containment under the constraints of XML Schema definitions. Furthermore, we develop a data model for XML Schema and an XPath-XSchema evaluator based on this data model. We also suggest an approach to rewriting and optimizing XPath expressions according to schemas. Our XPath-XSchema evaluator evaluates XPath queries on an XML Schema definition in order to check the satisfiability and containment of XPath expressions with respect to the schema. We present a complexity analysis of our XPath-XSchema evaluator, which shows that our approach is efficient in typical cases. We present an experimental analysis of our satisfiability tester, which demonstrates the optimization potential of avoiding the evaluation of unsatisfiable queries.
We prove the correctness of our approach to XPath containment and analyze its complexity. We develop a prototype of our containment tester, and the experimental results show the efficiency of our approach.
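The idea of evaluating an XPath query on a schema rather than on a document can be illustrated with a small sketch. This is not the dissertation's XPath-XSchema evaluator: it assumes a hypothetical schema reduced to a parent-to-children graph (`SCHEMA`, `ROOT`), supports only the child (`/`) and descendant (`//`) axes with name tests or `*`, and ignores predicates and occurrence constraints. Under these assumptions, an empty node set at any step means the simplified schema rules the query out; a non-empty result only means the query is not ruled out by this graph.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema graph (an assumption for illustration):
# element name -> names of elements allowed as children.
# "section" may contain "section", so this schema is recursive.
SCHEMA: Dict[str, Set[str]] = {
    "book": {"title", "section"},
    "section": {"title", "para", "section"},
    "title": set(),
    "para": set(),
}
ROOT = "book"
DOC = "#document"  # virtual document node whose only child is ROOT


def parse_steps(xpath: str) -> List[Tuple[str, str]]:
    """Split '/a//b' into [('child', 'a'), ('descendant', 'b')]."""
    steps: List[Tuple[str, str]] = []
    i = 0
    while i < len(xpath):
        if xpath.startswith("//", i):
            axis, i = "descendant", i + 2
        elif xpath.startswith("/", i):
            axis, i = "child", i + 1
        else:
            raise ValueError(f"expected '/' at position {i} in {xpath!r}")
        j = i
        while j < len(xpath) and xpath[j] != "/":
            j += 1
        if j == i:
            raise ValueError(f"missing name test at position {i} in {xpath!r}")
        steps.append((axis, xpath[i:j]))
        i = j
    return steps


def descendants(graph: Dict[str, Set[str]], names: Set[str]) -> Set[str]:
    """Names reachable through one or more child edges.

    The 'seen' set guarantees termination on recursive schemas.
    """
    seen: Set[str] = set()
    stack = [child for name in names for child in graph.get(name, set())]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(graph.get(name, set()))
    return seen


def schema_allows(xpath: str,
                  schema: Dict[str, Set[str]] = SCHEMA,
                  root: str = ROOT) -> bool:
    """Return False if the simplified schema graph rules the query out."""
    graph = dict(schema)
    graph[DOC] = {root}
    current: Set[str] = {DOC}
    for axis, name in parse_steps(xpath):
        if axis == "child":
            candidates: Set[str] = set()
            for node in current:
                candidates |= graph.get(node, set())
        else:
            candidates = descendants(graph, current)
        current = candidates if name == "*" else candidates & {name}
        if not current:
            return False  # no schema node matches this step
    return True
```

With this hypothetical schema, `schema_allows("/book/para")` is false because `para` is not declared as a child of `book`, so the query could be rejected without ever touching a document, whereas `schema_allows("/book//para")` is true.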
Similar resources
Speeding up Xml Querying
This dissertation develops approaches to testing the satisfiability and the containment of XPath queries in the presence of XML Schema definitions in order to speed up XML querying. XML provides a simple yet powerful mechanism for information storage, processing and delivery, and is a widely used standard data format. XPath is a basic language for querying XML data, and is embedded into many W3...
XPath Query Satisfiability and Containment under DTD Constraints
In this thesis, we consider the XML query language XPath, along with XML documents whose integrity constraints are presented in the form of document type definitions (DTDs). In particular, we study the problems of XPath satisfiability and XPath containment in the presence of DTDs. The motivation for studying XPath is that it is the main language for navigating in and extracting information from...
Filtering Unsatisfiable XPATH Queries
The satisfiability test checks whether or not the evaluation of a query returns the empty set for any input document, and can be used in query optimization for avoiding the submission and the computation of unsatisfiable queries. Thus, applying the satisfiability test before executing a query can save processing time and query costs. We focus on the satisfiability problem for queries formulate...
A DTD Graph Based XPath Query Subsumption Test
XPath expressions play a central role in querying for XML fragments. We present a containment test of two XPath queries which checks whether a new XPath query XP1 can reuse a previous query result XP2. The key idea is to transform XP1 into a graph which is used to search for sequences of elements which are used in the XPath query XP2.
Containment of Nested XML Queries
Query containment is the most fundamental relationship between a pair of database queries: a query Q is said to be contained in a query Q′ if the answer for Q is always a subset of the answer for Q′, independent of the current state of the database. Query containment is an important problem in a wide variety of data management applications, including verification of integrity constraints, reasoni...